Oracle索引扫描算法

SQL> create table t as select * from dba_objects;  
  
Table created.  
  
SQL> create index idx_t on t(object_id);  
  
Index created.  


SQL> BEGIN  
  2    DBMS_STATS.GATHER_TABLE_STATS(ownname          => 'TEST',  
  3                                  tabname          => 'T',  
  4                                  estimate_percent => 100,  
  5                                  method_opt       => 'for all columns size auto',  
  6                                  degree           => DBMS_STATS.AUTO_DEGREE,  
  7                                  cascade          => TRUE);  
  8  END;  
  9  /  

SQL> select leaf_blocks,blevel,clustering_factor from dba_indexes where index_name='IDX_T';  

LEAF_BLOCKS	BLEVEL CLUSTERING_FACTOR
----------- ---------- -----------------
	165	     1		    1705



LEAF_BLOCKS 叶子块 165个

BLEVEL  索引高度-1


集群因子；
CLUSTERING_FACTOR =1705


SQL> select count(distinct dbms_rowid.rowid_block_number(rowid)) from T;

COUNT(DISTINCTDBMS_ROWID.ROWID_BLOCK_NUMBER(ROWID))
---------------------------------------------------
					       1057

存储在1057个块中

SQL> set linesize 200
SQL> select b.num_rows,
       a.num_distinct,
       a.num_nulls,
       utl_raw.cast_to_number(high_value) high_value,
       utl_raw.cast_to_number(low_value) low_value,
       (b.num_rows - a.num_nulls) "NUM_ROWS-NUM_NULLS",
       utl_raw.cast_to_number(high_value) -
       utl_raw.cast_to_number(low_value) "HIGH_VALUE-LOW_VALUE"
  from dba_tab_col_statistics a, dba_tables b
 where a.owner = b.owner
   and a.table_name = b.table_name
   and a.owner = 'TEST'
   and a.table_name = upper('T')
   and a.column_name = 'OBJECT_ID';  2    3    4    5    6    7    8    9   10   11   12   13   14  

  NUM_ROWS NUM_DISTINCT  NUM_NULLS HIGH_VALUE  LOW_VALUE NUM_ROWS-NUM_NULLS HIGH_VALUE-LOW_VALUE
---------- ------------ ---------- ---------- ---------- ------------------ --------------------
     74486	  74486 	 0	77616	       2	      74486		   77614



SQL> explain plan for select owner from t where object_id<1000;

Explained.

SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------
Plan hash value: 1594971208

-------------------------------------------------------------------------------------
| Id  | Operation		    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT	    |	    |	958 | 10538 |	 26   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |	958 | 10538 |	 26   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN	    | IDX_T |	958 |	    |	  4   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------
---------------------------------------------------

   2 - access("OBJECT_ID"<1000)

14 rows selected.


索引扫描首先要定义到叶子块:


定位到叶子块 要扫描 多少个块？？？  需要高度-1个块

叶子块个数 乘以 选择性

定位到叶子块 要扫描 多少个块？？？

回表和集群因子有关:

选择性(Selectivity) 列唯一键(Distinct_Keys) 与行数(Num_Rows)的比值。


这里有个概念叫有效选择性 ，< 的有效选择性为

(limit-low_value)/(high_value-low_value)


limit 是限制
1000

low_value=2

1000-2 有可能扫到的值的范围


high_value-low_value  表示总共有多少个值:

HIGH_VALUE=77616

LOW_VALUE=2

HIGH_VALUE-LOW_VALUE=77614

LEAF_BLOCKS=165


索引扫描的计算公式如下：
cost =  
 blevel +  
 celiling(leaf_blocks *effective index selectivity) +  
 celiling(clustering_factor * effective table selectivity)


SQL> select 1+ceil(165*(1000-2)/77614)+ceil(1705*(1000-2)/77614) from dual; 

1+CEIL(165*(1000-2)/77614)+CEIL(1705*(1000-2)/77614)
----------------------------------------------------
						  26


为啥effective table selectivity和effective index selectivity一样？

表和索引都包含指定列的数据 两者当然一样