组合索引怎么应该怎么选取引导列？

有这样一个SQL
select count(*) from t1,t2 where t1.id=t2.id and t1.owner='SCOTT';
id列选择性很高，owner选择性很低
要优化它很简单，只需要在t1表上建立一个组合索引(owner,id)，在t2表上建立一个索引(id)
现在要讨论的是我们应该怎么建立组合索引，哪一列(owner,id)应该放在最前面？

现在来做个实验

SQL> desc t1
Name        Type          Nullable Default Comments 
----------- ------------- -------- ------- -------- 
ID          NUMBER        Y                         
OBJECT_NAME VARCHAR2(128) Y                         
OWNER       VARCHAR2(30)  Y                         

SQL> desc t2
Name      Type        Nullable Default Comments 
--------- ----------- -------- ------- -------- 
ID        NUMBER      Y                         
STATUS    VARCHAR2(7) Y                         
TEMPORARY VARCHAR2(1) Y    


SQL> create index inx_id on t2(id);

Index created.

SQL> create index inx_id_owner on t1(id,owner);

Index created.

SQL> select count(*) from t1,t2 where t1.id=t2.id and t1.owner='SCOTT';

Elapsed: 00:00:00.02

Execution Plan
----------------------------------------------------------
Plan hash value: 2432674005

---------------------------------------------------------------------------------------
| Id  | Operation              | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |              |     1 |    16 |    88   (2)| 00:00:02 |
|   1 |  SORT AGGREGATE        |              |     1 |    16 |            |          |
|*  2 |   HASH JOIN            |              |  2416 | 38656 |    88   (2)| 00:00:02 |
|*  3 |    INDEX FAST FULL SCAN| INX_ID_OWNER |  2416 | 26576 |    50   (0)| 00:00:01 |
|   4 |    INDEX FAST FULL SCAN| INX_ID       | 72475 |   353K|    37   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T1"."ID"="T2"."ID")
   3 - filter("T1"."OWNER"='SCOTT')


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        392  consistent gets
          0  physical reads
          0  redo size
        422  bytes sent via SQL*Net to client
        420  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
                      
SQL> create index inx_owner_id on t1(owner,id);

Index created.

SQL> select count(*) from t1,t2 where t1.id=t2.id and t1.owner='SCOTT';

Elapsed: 00:00:00.03

Execution Plan
----------------------------------------------------------
Plan hash value: 277464349

---------------------------------------------------------------------------------------
| Id  | Operation              | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |              |     1 |    16 |    47   (3)| 00:00:01 |
|   1 |  SORT AGGREGATE        |              |     1 |    16 |            |          |
|*  2 |   HASH JOIN            |              |  2416 | 38656 |    47   (3)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN    | INX_OWNER_ID |  2416 | 26576 |     9   (0)| 00:00:01 |
|   4 |    INDEX FAST FULL SCAN| INX_ID       | 72475 |   353K|    37   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T1"."ID"="T2"."ID")
   3 - access("T1"."OWNER"='SCOTT')


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        169  consistent gets
          0  physical reads
          0  redo size
        422  bytes sent via SQL*Net to client
        420  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

如果OWNER作为引导列，逻辑读由以前的392变成现在的169，并且由以前的 index fast full scan 变成index range scan

如果强制指定走索引 inx_id_owner 
          
SQL>  select /*+ index(t1 inx_id_owner) */ count(*) from t1,t2 where t1.id=t2.id and t1.owner='SCOTT';

Elapsed: 00:00:00.03

Execution Plan
----------------------------------------------------------
Plan hash value: 3161475902

---------------------------------------------------------------------------------------
| Id  | Operation              | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |              |     1 |    16 |   259   (1)| 00:00:04 |
|   1 |  SORT AGGREGATE        |              |     1 |    16 |            |          |
|*  2 |   HASH JOIN            |              |  2416 | 38656 |   259   (1)| 00:00:04 |
|*  3 |    INDEX FULL SCAN     | INX_ID_OWNER |  2416 | 26576 |   221   (1)| 00:00:03 |
|   4 |    INDEX FAST FULL SCAN| INX_ID       | 72475 |   353K|    37   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T1"."ID"="T2"."ID")
   3 - access("T1"."OWNER"='SCOTT')
       filter("T1"."OWNER"='SCOTT')


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        387  consistent gets
          0  physical reads
          0  redo size
        422  bytes sent via SQL*Net to client
        420  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

依然要387个逻辑读。

为什么要以owner为引导列？因为ID是join列，并且ID列上面没过滤条件，如果以ID列作为引导列，由于没过滤条件
那么CBO只能走 index full scan,或者index fast full scan，因为引导列没过滤条件，走不了index range scan,
最多走index skip scan，不过index skip scan代价过高，因为index skip scan要求 引导列选择性很低，但是ID这里选择性很高

SQL> select /*+ index_ss(t1 inx_id_owner) */ count(*) from t1,t2 where t1.id=t2.id and t1.owner='SYS';

Elapsed: 00:00:00.10

Execution Plan
----------------------------------------------------------
Plan hash value: 3493079762

---------------------------------------------------------------------------------------
| Id  | Operation              | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |              |     1 |    16 | 72529   (1)| 00:14:31 |
|   1 |  SORT AGGREGATE        |              |     1 |    16 |            |          |
|*  2 |   HASH JOIN            |              |  2416 | 38656 | 72529   (1)| 00:14:31 |
|*  3 |    INDEX SKIP SCAN     | INX_ID_OWNER |  2416 | 26576 | 72491   (1)| 00:14:30 |
|   4 |    INDEX FAST FULL SCAN| INX_ID       | 72475 |   353K|    37   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T1"."ID"="T2"."ID")
   3 - access("T1"."OWNER"='SYS')
       filter("T1"."OWNER"='SYS')


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        387  consistent gets
          0  physical reads
          0  redo size
        424  bytes sent via SQL*Net to client
        420  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

如果owner列作为引导列，那么优化器就可以选择index range scan,这样相比index full scan, index fast full scan
肯定要少扫描很多leaf block,逻辑读就会相对较少。

其实到这里，是否可以总结一下建立组合索引的原则呢？
引导列要选择过滤条件的列作为引导列，比如 where a.xxx='xxx' 或者 a.xxx> 或者 a.xxx<
引导列的选择性越高越好，因为选择性越高，扫描的leaf block就越少，效率就越高
尽量把join列放到组合索引最后面

这里选择以owner列作为引导列，由于owner选择性很低，所以测试索引压缩对于性能的提升

SQL> analyze  index inx_owner_id validate structure;

Index analyzed.

SQL> select height,
  2         blocks,
  3         lf_blks,
  4         br_blks,
  5         OPT_CMPR_COUNT,
  6         OPT_CMPR_PCTSAVE
  7    from index_stats
  8   where name = 'INX_OWNER_ID';

    HEIGHT     BLOCKS    LF_BLKS    BR_BLKS OPT_CMPR_COUNT OPT_CMPR_PCTSAVE
---------- ---------- ---------- ---------- -------------- ----------------
         2        256        219          1              1               26
 
SQL> drop index inx_owner_id;

Index dropped

SQL> create index inx_owner_id on t1(owner,id) compress 1;

Index created

SQL> analyze  index inx_owner_id validate structure;

Index analyzed

SQL> 
SQL> select height,
  2         blocks,
  3         lf_blks,
  4         br_blks,
  5         OPT_CMPR_COUNT,
  6         OPT_CMPR_PCTSAVE
  7    from index_stats
  8   where name = 'INX_OWNER_ID';

    HEIGHT     BLOCKS    LF_BLKS    BR_BLKS OPT_CMPR_COUNT OPT_CMPR_PCTSAVE
---------- ---------- ---------- ---------- -------------- ----------------
         2        256        161          1              1                0
         
索引压缩之后，Leaf block 由原来的219降低到161个，节约了58个block 现在再来看一看执行计划+统计信息

SQL> select count(*) from t1,t2 where t1.id=t2.id and t1.owner='SCOTT';

Elapsed: 00:00:00.03

Execution Plan
----------------------------------------------------------
Plan hash value: 277464349

---------------------------------------------------------------------------------------
| Id  | Operation              | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |              |     1 |    16 |    45   (3)| 00:00:01 |
|   1 |  SORT AGGREGATE        |              |     1 |    16 |            |          |
|*  2 |   HASH JOIN            |              |  2416 | 38656 |    45   (3)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN    | INX_OWNER_ID |  2416 | 26576 |     7   (0)| 00:00:01 |
|   4 |    INDEX FAST FULL SCAN| INX_ID       | 72475 |   353K|    37   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T1"."ID"="T2"."ID")
   3 - access("T1"."OWNER"='SCOTT')


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        169  consistent gets
          0  physical reads
          0  redo size
        422  bytes sent via SQL*Net to client
        420  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

由此可见，索引压缩之后，逻辑读并没有下降，还是169，但是索引的leaf blcok显著减少了，这样减少了存储空间，能降低物理IO