intra blcok chain（转）

转载自love wife & love life —Roger 提供oracle技术支持服务

最近在一个优化项目中，通过awr报告发现table fetch continued row 指标很高,怀疑是行迁移/链接比较严重。
后来经过沟通发现，原来客户的数据库中存在几个table，其column 数目超过255. 针对超过255列的行数据.如下：

Statistic Total per Second per Trans

dirty buffers inspected 7,532 4.17 0.25

free buffer inspected 17,409,018 9,633.69 575.56

free buffer requested 17,115,682 9,471.36 565.86

table fetch by rowid 98,848,588 54,700.18 3,268.05

table fetch continued row 97,797,107 54,118.32 3,233.28

oracle会将其每行数据都分成2个row piece. 实际上如果超过510个列，那么会被分成3个row piece存放在同一个
block中。 10gR2的官方文档是这样描述的：

"When a table has more than 255 columns, rows that have data after the 255th column are likely to

be chained within the same block. This is called intra-block chaining. A chained row's pieces are

chained together using the rowids of the pieces. With intra-block chaining, users receive all the

data in the same block. If the row fits in the block, users do not see an effect in I/O

performance, because no extra I/O operation is required to retrieve the rest of the row."

创建超过255列的测试表：

----tab1 256 columns

SQL> declare

2 v_sql varchar2(32767) ;

3 begin

4 v_sql := 'create table t_chain1 ( ' ;

5 for i in 1..256 loop

6 v_sql := v_sql || 'id'||i||' number,' ;

7 end loop ;

8 v_sql := rtrim(v_sql, ',') || ')';

9 execute immediate v_sql;

10 end ;

11 /

PL/SQL procedure successfully completed.

SQL> select count(1) from user_tab_columns where table_name='T_CHAIN1';

COUNT(1)

----------

256

SQL> create sequence t_chain_seq

2 minvalue 1

3 nomaxvalue

4 start with 1

5 increment by 1

6 nocycle

7 cache 10000;

Sequence created.

SQL>

SQL> insert into t_chain1(id256) values(t_chain_seq.NEXTVAL);

1 row created.

SQL> /

1 row created.

SQL> /

1 row created.

SQL> commit;

Commit complete.

SQL> select count(1) from t_chain1;

COUNT(1)

----------

3

SQL>

再创建一个255列的测试表进行对比：

----tab2 255 columns

SQL> declare

2 v_sql varchar2(32767) ;

3 begin

4 v_sql := 'create table t_chain2 ( ' ;

5 for i in 1..255 loop

6 v_sql := v_sql || 'id'||i||' number,' ;

7 end loop ;

8 v_sql := rtrim(v_sql, ',') || ')';

9 execute immediate v_sql;

10 end ;

11 /

PL/SQL procedure successfully completed.

SQL> select count(1) from user_tab_columns where table_name='T_CHAIN2';

COUNT(1)

----------

255

SQL> create sequence t_chain_seq2

2 minvalue 1

3 nomaxvalue

4 start with 1

5 increment by 1

6 nocycle

7 cache 10000;

Sequence created.

SQL> insert into t_chain2(id255) values(t_chain_seq2.NEXTVAL);

1 row created.

SQL> /

1 row created.

SQL> /

1 row created.

SQL> commit;

Commit complete.

SQL> select count(1) from t_chain2;

COUNT(1)

----------

3

通过前面的测试表，我们来对比观察下这2个表的block信息的差别：

----使用dbms_rowid定位到block 号

SQL> select dbms_rowid.rowid_relative_fno(t.rowid) as "file#" ,

2 dbms_rowid.rowid_block_number(t.rowid) as "block#" from t_chain1 t ;

file# block#

---------- ----------

2 73688

SQL> select dbms_rowid.rowid_relative_fno(t.rowid) as "file#" ,

2 dbms_rowid.rowid_block_number(t.rowid) as "block#" from t_chain2 t ;

file# block#

---------- ----------

2 73696

我们先来看下这2种情况下，block内信息的差异.

---tab1 256 column dump

data_block_dump,data header at 0xd0da87c

===============

tsiz: 0x1f80

hsiz: 0x1e

pbl: 0x0d0da87c

bdba: 0x00811fd8

76543210

flag=--------

ntab=1

nrow=6 ---注意这里是6(实际上测试表的数据只有3条)

frre=-1

fsbo=0x1e

fseo=0x1c56

avsp=0x1c38

tosp=0x1c38

0xe:pti[0] nrow=6 offs=0

0x12:pri[0] offs=0x1e7c

0x14:pri[1] offs=0x1e72

0x16:pri[2] offs=0x1d6e

0x18:pri[3] offs=0x1d64

0x1a:pri[4] offs=0x1c60

0x1c:pri[5] offs=0x1c56

block_row_dump:

tab 0, row 0, @0x1e7c 我们可以看到,oracle这样讲第1行数据分成了2个row piece.这里是第一个row piece的offset

tl: 260 fb: -----L-- lb: 0x1 cc: 255

col 0: *NULL*

col 1: *NULL*

col 2: *NULL*

col 3: *NULL*

.......

col 253: *NULL*

col 254: [ 2] c1 02 从这里我们可以看到，oracle将有数据的列存放到第一个row piece了.

tab 0, row 1, @0x1e72 这里是第一行数据的第2个row piece.

tl: 10 fb: --H-F--- lb: 0x1 cc: 1

nrid: 0x00811fd8.0 这里的nrid相当于rdba，转换后即为我们的file 2 block 73688地址.

col 0: *NULL*

tab 0, row 2, @0x1d6e

tl: 260 fb: -----L-- lb: 0x1 cc: 255

col 0: *NULL*

col 1: *NULL*

col 2: *NULL*

col 3: *NULL*

......

col 252: *NULL*

col 253: *NULL*

col 254: [ 2] c1 03

tab 0, row 3, @0x1d64

tl: 10 fb: --H-F--- lb: 0x1 cc: 1

nrid: 0x00811fd8.2

col 0: *NULL*

tab 0, row 4, @0x1c60

tl: 260 fb: -----L-- lb: 0x1 cc: 255

col 0: *NULL*

col 1: *NULL*

col 2: *NULL*

col 3: *NULL*

......

col 252: *NULL*

col 253: *NULL*

col 254: [ 2] c1 04

tab 0, row 5, @0x1c56

tl: 10 fb: --H-F--- lb: 0x1 cc: 1

nrid: 0x00811fd8.4

col 0: *NULL*

end_of_block_dump

对于超过255列的表，oracle会将其在block内的一行数据分成2个row piece来存放。但是会将有数据的列存放在
第一个row piece中,无数的列存放的在第2个row piece中.

如下是tab2 255 columns的dump：

data_block_dump,data header at 0xd0da864

===============

tsiz: 0x1f98

hsiz: 0x18

pbl: 0x0d0da864

bdba: 0x00811fe0

76543210

flag=--------

ntab=1

nrow=3

frre=-1

fsbo=0x18

fseo=0x1c8c

avsp=0x1c74

tosp=0x1c74

0xe:pti[0] nrow=3 offs=0

0x12:pri[0] offs=0x1e94

0x14:pri[1] offs=0x1d90

0x16:pri[2] offs=0x1c8c

block_row_dump:

tab 0, row 0, @0x1e94

tl: 260 fb: --H-FL-- lb: 0x1 cc: 255

col 0: *NULL*

col 1: *NULL*

col 2: *NULL*

......

col 253: *NULL*

col 254: [ 2] c1 02

tab 0, row 1, @0x1d90

tl: 260 fb: --H-FL-- lb: 0x1 cc: 255

col 0: *NULL*

col 1: *NULL*

col 2: *NULL*

.......

col 253: *NULL*

col 254: [ 2] c1 03

tab 0, row 2, @0x1c8c

tl: 260 fb: --H-FL-- lb: 0x1 cc: 255

col 0: *NULL*

col 1: *NULL*

col 2: *NULL*

col 3: *NULL*

........

col 253: *NULL*

col 254: [ 2] c1 04

end_of_block_dump

对于未超过255列的表.没有什么特别的，正是大家平时所看到的这样.

这里给大家补充一点，为什么oracle这里最大允许255个列呢 ？ 如下：

BBED> x /rnnnnnnnnnnnnnnnnnnnn

rowdata[6686] @7924

-------------

flag@7924: 0x04 (KDRHFL)

lock@7925: 0x00

cols@7926: 255

.......

BBED> d /v offset 7924 count 10

File: /home/ora10g/oradata/roger01.dbf (2)

Block: 73688 Offsets: 7924 to 7933 Dba:0x00811fd8

-------------------------------------------------------

d80000ff 0401ffff ffff l ........

<16 bytes per line>

可以看到oracle用了一个byte来存放column 的count值。简称cc.

我们知道，1个byte等于8个bit.  一个bit最大表示的数目是power(2,1).以此类推，那么一个byte
所能表示的最大数目都是power(2,8),即使256.  所以oracle这里一个row piece最大允许255个column.

超过255列即为分到另外一个row piece中.
虽然数据被划分到了其他的row piece，然而，一行数据仍然是存在同一个block中. 同时，每行都存在一个rowid.

SQL> select rowid from t_chain1;

ROWID

------------------

AAAPJfAACAAAR/YAAB

AAAPJfAACAAAR/YAAD

AAAPJfAACAAAR/YAAF

SQL> select dbms_rowid.rowid_object(rowid) obj#,

2 dbms_rowid.rowid_relative_fno(rowid) rfile#,

3 dbms_rowid.rowid_block_number(rowid) block#,

4 dbms_rowid.rowid_row_number(rowid) row#

5 from t_chain1 ;

OBJ# RFILE# BLOCK# ROW#

---------- ---------- ---------- ----------

62047 2 73688 1

62047 2 73688 3

62047 2 73688 5

我们可以看到，虽然该block存放了6行数据(3条数据,6个row piece)，然而其rowid只有3个. 从这点也可以看出，其实虽然一条数据
被分成2个row piece，然而其rowid却只有一个.所有如果是所有通过rowid访问，那么是可以指定返回整条数据的，不需要产生额外的IO。
那么，针对这样的行内迁移(intra block chain)，到底是否会产生多余的IO消耗呢 ？我们通过创建通过如下的SQL来简单测试下：

+++++++首先测试不存在行迁移的表，观察逻辑读消耗

SQL> conn roger/roger

Connected.

SQL> select a.NAME, b.VALUE

2 from v$statname a, v$mystat b

3 where a.STATISTIC# = b.STATISTIC#

4 and lower(a.NAME) = 'table fetch continued row';

NAME VALUE

---------------------------------------------------------------- ----------

table fetch continued row 0

SQL> alter system flush shared_pool;

System altered.

SQL> alter system flush BUFFER_CACHE;

System altered.

SQL> set autot on

SQL> set lines 200

SQL> select count(1) from t_chain2 where id255=2;

COUNT(1)

----------

1

Execution Plan

----------------------------------------------------------

Plan hash value: 1667017148

-------------------------------------------------------------------------------

-------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 |

| 1 | SORT AGGREGATE | | 1 | 3 | | |

|* 2 | TABLE ACCESS FULL| T_CHAIN2 | 1 | 3 | 3 (0)| 00:00:01 |

-------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - filter("ID255"=2)

Statistics

----------------------------------------------------------

476 recursive calls

0 db block gets

41 consistent gets

17 physical reads

0 redo size

411 bytes sent via SQL*Net to client

400 bytes received via SQL*Net from client

2 SQL*Net roundtrips to/from client

4 sorts (memory)

0 sorts (disk)

1 rows processed

++++++++ 测试intra block chaining的表

SQL> select count(1) from t_chain1 where id256=2;

COUNT(1)

----------

1

SQL> select a.NAME, b.VALUE

2 from v$statname a, v$mystat b

3 where a.STATISTIC# = b.STATISTIC#

4 and lower(a.NAME) = 'table fetch continued row';

NAME VALUE

---------------------------------------------------------------- ----------

table fetch continued row 48

SQL> alter system flush buffer_cache;

System altered.

SQL> set autot on

SQL> set autot off

SQL> select a.NAME, b.VALUE

2 from v$statname a, v$mystat b

3 where a.STATISTIC# = b.STATISTIC#

4 and lower(a.NAME) = 'table fetch continued row';

NAME VALUE

---------------------------------------------------------------- ----------

table fetch continued row 48

SQL> set autot on

SQL> select count(1) from t_chain1 where id256=2;

COUNT(1)

----------

1

Execution Plan

----------------------------------------------------------

Plan hash value: 432753352

-------------------------------------------------------------------------------

-------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 |

| 1 | SORT AGGREGATE | | 1 | 3 | | |

|* 2 | TABLE ACCESS FULL| T_CHAIN1 | 1 | 3 | 3 (0)| 00:00:01 |

-------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - filter("ID256"=2)

Statistics

----------------------------------------------------------

1 recursive calls

0 db block gets

10 consistent gets

6 physical reads

0 redo size

411 bytes sent via SQL*Net to client

400 bytes received via SQL*Net from client

2 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

1 rows processed

SQL> alter system flush buffer_cache;

System altered.

SQL>

SQL> set autot off

SQL> select a.NAME, b.VALUE

2 from v$statname a, v$mystat b

3 where a.STATISTIC# = b.STATISTIC#

4 and lower(a.NAME) = 'table fetch continued row';

NAME VALUE

---------------------------------------------------------------- ----------

table fetch continued row 63

SQL> alter system flush BUFFER_CACHE;

System altered.

SQL> set autot on

SQL> select count(1) from t_chain1 where id256=2;

COUNT(1)

----------

1

Execution Plan

----------------------------------------------------------

Plan hash value: 432753352

-------------------------------------------------------------------------------

-------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 |

| 1 | SORT AGGREGATE | | 1 | 3 | | |

|* 2 | TABLE ACCESS FULL| T_CHAIN1 | 1 | 3 | 3 (0)| 00:00:01 |

-------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

2 - filter("ID256"=2)

Statistics

----------------------------------------------------------

477 recursive calls

0 db block gets

38 consistent gets

18 physical reads

0 redo size

411 bytes sent via SQL*Net to client

400 bytes received via SQL*Net from client

2 SQL*Net roundtrips to/from client

4 sorts (memory)

0 sorts (disk)

1 rows processed

SQL> select a.NAME, b.VALUE

2 from v$statname a, v$mystat b

3 where a.STATISTIC# = b.STATISTIC#

4 and lower(a.NAME) = 'table fetch continued row';

NAME VALUE

---------------------------------------------------------------- ----------

table fetch continued row 87

我们可以看到虽然intra block chain，会由于对表的访问导致table fetch continued row 统计信息的增加，
然而通过测试我们可以发现,其本身并不会消耗额外的IO。

从我上面的2个测试来看：

t_chain1    256 列
t_chain2    255 列

这2个测试表均包含3条数据。 全表扫描的逻辑读几乎一致。

---------------------------------------------------------------

以上是roger的博文，我基本照做了一遍，没有问题，当中有一个概念

对于超过255列的表，oracle会将其在block内的一行数据分成2个row piece来存放。但是会将有数据的列存放在
第一个row piece中,无数的列存放的在第2个row piece中.

这个概念有误，oracle用逆向存储数据的方式，即最新的数据存在最前段，所以即使是行链，也不是这种情况。（老杨说）

导出块的方式 alter system dump datafile 2 block 73688

nrid: 0x00811fd8.0 是对应的rdba，那是怎么转变的呢？

这里先把nrid转换成数字

　　　　SQL> select to_number('00811fd8','xxxxxxxx') from dual;

　　　　TO_NUMBER('00811FD8','XXXXXXXX

　　　　------------------------------

8462296

　　　　执行dbms_utility.data_block_address_block(8462296)

　　　　dbms_utility.data_block_address_file(8462296)

　　　　得到相应的值


ps：我收藏这篇博文的目的是因为，我比较缺乏这种测试的想法，个人觉得这篇博文，知识点还有测试方法都不错，所以做一个记录