跨resetlogs不完全恢复- oracle自动reset错误incarnation

在做oracle跨resetlogs的不完全恢复时，如果未删除flashback area的控制文件自动备份，那么Oracle可能会从控制文件自动备份中自动探测到incarnation信息，并重置incarnation，从而导致恢复报错。(不完全恢复测试一)

在做oracle跨resetlogs的不完全恢复时，如果需要catalog归档日志，注意不要catalog不相关的归档日志，否则Oracle可能会从归档日志中自动探测到incarnation信息，并重置incarnation，从而导致恢复报错。(不完全恢复测试二)

1. 环境准备

1.1 开启flashback

检查数据库flashback是否开启，如果未开启，则参考相关文档开启flashback。

SQL> select flashback_on from v$database;

FLASHBACK_ON
------------------
YES

1.2 开启controlfile自动备份

如果开启了flashback database, 缺省情况下controlfile自动备份存储在flashback area里。

RMAN> show controlfile autobackup;

using target database control file instead of recovery catalog
RMAN configuration parameters are:
CONFIGURE CONTROLFILE AUTOBACKUP OFF; # default

RMAN> CONFIGURE CONTROLFILE AUTOBACKUP ON;

new RMAN configuration parameters:
CONFIGURE CONTROLFILE AUTOBACKUP ON;
new RMAN configuration parameters are successfully stored

1.3 做RMAN全备

执行RMAN全备，运行以下脚本做全库备份(使用controlfile做为catalog)：
run {
allocate channel ch0 type disk;
backup as compressed backupset format '/export/home/oracle/bk_%s_%p_%t' database;
sql 'alter system archive log current';
backup format '/export/home/oracle/cntl_%s_%p_%t' current controlfile;
release channel ch0;
}

1.4 全备后，做一些数据库操作

创建表空间、用户和表:

SQL> create tablespace testtbs datafile '+DATA' size 10m autoextend on; Tablespace created.

SQL> create user test identified by test default tablespace testtbs; User created. SQL> grant connect,resource to test; Grant succeeded. SQL> conn test /test Connected.

SQL> create table testtb (id number); Table created. SQL> insert into testtb values (1); 1 row created. SQL> commit; Commit complete.

切换日志：

SQL> conn /as sysdba

Connected.

SQL> alter system archive log current;

System altered

检查incarnation信息，此时只有一个incarnation：

RMAN> list incarnation;

List of Database Incarnations
DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time
------- ------- -------- ---------------- ------- ---------- ----------
1 1 BEICENP 1011448374 CURRENT 1 06-NOV-12

检查scn，此scn为后续恢复操作所使用的scn:

SQL> select current_scn from v$database;

CURRENT_SCN
-----------
1304918

1.5 模拟resetlogs

做一次不完全恢复，来模拟resetlogs操作，增加数据库incarnation。

模拟故障，模拟删除数据文件(不删除控制文件)：

SQL> shutdown abort;

使用ASM命令删除数据文件。

做不完全恢复：

启动数据库到mount状态： SQL> startup mount; ORACLE instance started. Total System Global Area 935329792 bytes Fixed Size 2025168 bytes Variable Size 306186544 bytes Database Buffers 620756992 bytes Redo Buffers 6361088 bytes Database mounted. 使用rman 命令恢复： run { allocate channel ch0 type disk; restore database; recover database until scn 1304918; release channel ch0; sql 'alter database open resetlogs'; } 恢复完全后incarnation信息： RMAN> list incarnation; using target database control file instead of recovery catalog List of Database Incarnations DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time ------- ------- -------- ---------------- --- ---------- ---------- 1 1 BEICENP 1011448374 PARENT 1 06-NOV-12 2 2 BEICENP 1011448374 CURRENT 1304920 25-DEC-13 恢复完成后，增加了一个新的incarnation。

1.6 resetlogs后，新创建一个表空间

创建表空间的目的是让oracle自动备份一个resetlogs后的控制文件。

创建另一个表空间：
SQL> create tablespace testtbs_new datafile '+DATA' size 10m autoextend on;

Tablespace created.

表空间创建后，在alert日志里可以看到控制文件自动备份信息：
Starting control autobackup
Wed Dec 25 15:19:46 2013
Control autobackup written to DISK device
        handle '+FLASH/beicenp/autobackup/2013_12_25/s_835111170.276.835111185'
Completed: alter database open resetlogs
Wed Dec 25 15:20:58 2013
create tablespace testtbs_new datafile '+DATA' size 10m autoextend on
Wed Dec 25 15:20:59 2013
Starting control autobackup
Control autobackup written to DISK device
        handle '+FLASH/beicenp/autobackup/2013_12_25/s_835111259.277.835111261'
Completed: create tablespace testtbs_new datafile '+DATA' size 10m autoextend on
在创建表空间前后，都会做控制文件自动备份。

做一次归档：
SQL> alter system archive log current;

System altered.

2. 不完全恢复测试一

模拟故障，删除数据文件和控制文件，不删除flashback area的控制文件自动备份。使用备份控制文件，不完全恢复到SCN 1304918。

2.1 模拟故障，删除数据文件和控制文件

模拟故障：
SQL> shutdown abort;
ORACLE instance shut down.

使用ASM命令删除数据文件和控制文件，注意不要删除flashback area。

2.2 恢复数据库到SCN 1304918

启动到nomount状态：
SQL> startup nomount;
ORACLE instance started.

Total System Global Area  935329792 bytes
Fixed Size                  2025168 bytes
Variable Size             314575152 bytes
Database Buffers          612368384 bytes
Redo Buffers                6361088 bytes

使用以下RMAN脚本恢复数据库：
run {
allocate channel ch0 type disk;
restore controlfile from '/export/home/oracle/cntl_7_1_835109919';
sql 'alter database mount';
restore database;
recover database until scn 1304918;
release channel ch0;
sql 'alter database open resetlogs';
}
此处使用的是备份控制文件，里面不应该包含incarnation2的信息。

以下是RMAN恢复脚本部分输出信息：

输出信息表明控制文件自动备份被Oracle自动执行了catalog. 并且RMAN恢复报错RMAN-20208

searching for all files in the recovery area
cataloging files...
cataloging done

List of Cataloged Files
=======================
File Name: +flash/BEICENP/AUTOBACKUP/2013_12_25/s_835109928.262.835109931
File Name: +flash/BEICENP/AUTOBACKUP/2013_12_25/s_835110040.263.835110043
File Name: +flash/BEICENP/AUTOBACKUP/2013_12_25/s_835111170.276.835111185
File Name: +flash/BEICENP/AUTOBACKUP/2013_12_25/s_835111259.277.835111261
......
Starting recover at 25-DEC-13
released channel: ch0
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 12/25/2013 16:28:47
RMAN-20208: UNTIL CHANGE is before RESETLOGS change

alert日志中有以下信息，Oracle自动将incarnation由1设置成2.

New incarnation branch detected in Backup, filename +FLASH/beicenp/autobackup/2013_12_25/s_835111170.276.835111185
Inspection of file changed rdi from 1 to 2
Setting recovery target incarnation to 2
Wed Dec 25 16:27:19 2013
Setting recovery target incarnation to 2

出现这个报错的解决方法如下：

reset incarnation :
RMAN> list incarnation;
List of Database Incarnations
DB Key  Inc Key DB Name  DB ID            STATUS  Reset SCN  Reset Time
------- ------- -------- ---------------- --- ---------- ----------
1       1       BEICENP  1011448374       PARENT  1          06-NOV-12
2       2       BEICENP  1011448374       CURRENT 1304920    25-DEC-13

RMAN> reset database to incarnation 1;
database reset to incarnation 1

RMAN>  list incarnation;
List of Database Incarnations
DB Key  Inc Key DB Name  DB ID            STATUS  Reset SCN  Reset Time
------- ------- -------- ---------------- --- ---------- ----------
1       1       BEICENP  1011448374       CURRENT 1          06-NOV-12
2       2       BEICENP  1011448374       ORPHAN  1304920    25-DEC-13


然后执行不完全恢复
RMAN> recover database until scn 1304918;

Starting recover at 25-DEC-13
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=153 devtype=DISK

starting media recovery

archive log thread 1 sequence 58 is already on disk as file +ARCH/beicenp/archivelog/2013_12_25/thread_1_seq_58.292.835109917
archive log filename=+ARCH/beicenp/archivelog/2013_12_25/thread_1_seq_58.292.835109917 thread=1 sequence=58
unable to find archive log
archive log thread=1 sequence=59
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 12/25/2013 17:28:47
RMAN-06054: media recovery requesting unknown log: thread 1 seq 59 lowscn 1296057

由于备份的控制文件中缺少部分归档日志信息，需要catalog归档日志，然后再做recover
RMAN> catalog start with '+ARCH';

searching for all files that match the pattern +ARCH

List of Files Unknown to the Database
=====================================
File Name: +arch/BEICENP/ARCHIVELOG/2013_12_25/thread_1_seq_59.293.835110115
File Name: +arch/BEICENP/ARCHIVELOG/2013_12_25/thread_1_seq_60.294.835110233
File Name: +arch/BEICENP/ARCHIVELOG/2013_12_25/thread_1_seq_61.295.835111123
File Name: +arch/BEICENP/ARCHIVELOG/2013_12_25/thread_1_seq_1.296.835111307
File Name: +arch/BEICENP/ARCHIVELOG/2012_11_06/thread_1_seq_22.256.798651655
File Name: +arch/BEICENP/ARCHIVELOG/2012_11_06/thread_1_seq_23.257.798656015
File Name: +arch/BEICENP/ARCHIVELOG/2012_11_06/thread_1_seq_24.258.798656205
File Name: +arch/BEICENP/ARCHIVELOG/2012_11_06/thread_1_seq_25.259.798656343

Do you really want to catalog the above files (enter YES or NO)? yes
cataloging files...
cataloging done

List of Cataloged Files
=======================
File Name: +arch/BEICENP/ARCHIVELOG/2013_12_25/thread_1_seq_59.293.835110115
File Name: +arch/BEICENP/ARCHIVELOG/2013_12_25/thread_1_seq_60.294.835110233
File Name: +arch/BEICENP/ARCHIVELOG/2013_12_25/thread_1_seq_61.295.835111123
File Name: +arch/BEICENP/ARCHIVELOG/2013_12_25/thread_1_seq_1.296.835111307
File Name: +arch/BEICENP/ARCHIVELOG/2012_11_06/thread_1_seq_22.256.798651655
File Name: +arch/BEICENP/ARCHIVELOG/2012_11_06/thread_1_seq_23.257.798656015
File Name: +arch/BEICENP/ARCHIVELOG/2012_11_06/thread_1_seq_24.258.798656205
File Name: +arch/BEICENP/ARCHIVELOG/2012_11_06/thread_1_seq_25.259.798656343

RMAN> recover database until scn 1304918;
RMAN> sql 'alter database open resetlogs';
reset incarnation后虽然数据库能够做不完全恢复，但在控制文件却多出了一个ORPHAN的incarnation，这是我不希望看到的。如果在做不完全恢复之前将flashback area清空(前提是备份集没放在flashback area)，就不会出现这个问题。

3. 不完全恢复测试二

模拟故障，删除数据文件和控制文件和flashback area的控制文件自动备份。使用备份控制文件，不完全恢复到SCN 1304918。

3.1 模拟故障，删除数据文件、控制文件和控制文件自动备份

模拟故障：
SQL> shutdown abort;
ORACLE instance shut down.

使用ASM命令删除数据文件、控制文件和flashback area的控制文件自动备份。

3.2 恢复数据库到SCN 1304918

启动到nomount状态：
SQL> startup nomount;
ORACLE instance started.

Total System Global Area  935329792 bytes
Fixed Size                  2025168 bytes
Variable Size             314575152 bytes
Database Buffers          612368384 bytes
Redo Buffers                6361088 bytes

使用以下RMAN脚本恢复备份控制文件：

run {
allocate channel ch0 type disk;
restore controlfile from '/export/home/oracle/cntl_7_1_835109919';
release channel ch0;
sql 'alter database mount';
}

此处使用的是备份控制文件，里面不包含incarnation2的信息:

由于备份控制文件缺少部分归档日志信息，需要catalog归档日志：

RMAN> catalog start with '+ARCH';

Starting implicit crosscheck backup at 26-DEC-13
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=153 devtype=DISK
Crosschecked 6 objects
Finished implicit crosscheck backup at 26-DEC-13

Starting implicit crosscheck copy at 26-DEC-13
using channel ORA_DISK_1
Finished implicit crosscheck copy at 26-DEC-13

searching for all files in the recovery area
cataloging files...
no files cataloged

searching for all files that match the pattern +ARCH

List of Files Unknown to the Database
=====================================
File Name: +arch/BEICENP/ARCHIVELOG/2013_12_25/thread_1_seq_59.293.835110115
File Name: +arch/BEICENP/ARCHIVELOG/2013_12_25/thread_1_seq_60.294.835110233
File Name: +arch/BEICENP/ARCHIVELOG/2013_12_25/thread_1_seq_61.295.835111123
File Name: +arch/BEICENP/ARCHIVELOG/2013_12_25/thread_1_seq_1.296.835111307
File Name: +arch/BEICENP/ARCHIVELOG/2012_11_06/thread_1_seq_22.256.798651655

Do you really want to catalog the above files (enter YES or NO)? yes
cataloging files...
cataloging done

List of Cataloged Files
=======================
File Name: +arch/BEICENP/ARCHIVELOG/2013_12_25/thread_1_seq_59.293.835110115
File Name: +arch/BEICENP/ARCHIVELOG/2013_12_25/thread_1_seq_60.294.835110233
File Name: +arch/BEICENP/ARCHIVELOG/2013_12_25/thread_1_seq_61.295.835111123
File Name: +arch/BEICENP/ARCHIVELOG/2013_12_25/thread_1_seq_1.296.835111307
File Name: +arch/BEICENP/ARCHIVELOG/2012_11_06/thread_1_seq_22.256.798651655

注意：thead_1_seq_1.296.835111307是resetlogs之后产生的归档日志，不完全恢复并不需要此归档日志。

catalog归档日志后，oracle错误地将incarnation设置为2：

RMAN> list incarnation;
List of Database Incarnations
DB Key Inc Key DB Name DB ID            STATUS Reset SCN Reset Time
------- ------- -------- ---------------- --- ---------- ----------
1       1       BEICENP 1011448374       PARENT 1          06-NOV-12
2       2       BEICENP 1011448374       CURRENT 1304920    25-DEC-13

同时alert日志中出现reset incarnation信息：

New incarnation branch detected in ArchiveLog, filename +ARCH/beicenp/archivelog/2013_12_25/thread_1_seq_1.296.835111307
Inspection of file changed rdi from 1 to 2
Setting recovery target incarnation to 2
Thu Dec 26 10:28:19 2013
Setting recovery target incarnation to 2

此时出现了与不完全恢复测试一中一样的问题。解决方案是：需要重置incarnation，然后再做恢复。

RMAN> reset database to incarnation 1;

database reset to incarnation 1

然后在RMAN里运行不完全恢复脚本：

run {
allocate channel ch0 type disk;
restore database;
recover database until scn 1304918;
release channel ch0;
sql 'alter database open resetlogs';
}

与测试一一样，reset incarnation后虽然数据库能够做不完全恢复，但在控制文件却多出了一个ORPHAN的incarnation。如果在做catalog归档日志时只catalog需要的归档日志(不要catalog resetlogs之后产生的归档日志)，就不会出现这个问题。

4. 不完全恢复测试三

前2次不完全恢复都是使用备份控制文件，此处使用当前控制文件进行恢复测试。

4.1 模拟故障，删除数据文件和flashback area的控制文件自动备份

模拟故障：
SQL> shutdown abort;
ORACLE instance shut down.

使用ASM命令删除数据文件和flashback area的控制文件自动备份。

4.2使用当前控制文件恢复数据库到SCN 1304918

执行以下恢复脚本：
run {
allocate channel ch0 type disk;
restore database;
recover database until scn 1304918;
release channel ch0;
sql 'alter database open resetlogs';
}

using target database control file instead of recovery catalog
allocated channel: ch0
channel ch0: sid=152 devtype=DISK

Starting restore at 26-DEC-13

creating datafile fno=5 name=+DATA/beicenp/datafile/testtbs.270.835110813
creating datafile fno=6 name=+DATA/beicenp/datafile/testtbs_new.271.835111259
channel ch0: starting datafile backupset restore
channel ch0: specifying datafile(s) to restore from backup set
restoring datafile 00001 to +DATA/beicenp/datafile/system.264.835110833
restoring datafile 00002 to +DATA/beicenp/datafile/undotbs1.262.835110845
restoring datafile 00003 to +DATA/beicenp/datafile/sysaux.261.835110873
restoring datafile 00004 to +DATA/beicenp/datafile/users.260.835110877
channel ch0: reading from backup piece /export/home/oracle/bk_5_1_835109756
channel ch0: restored backup piece 1
piece handle=/export/home/oracle/bk_5_1_835109756 tag=TAG20131225T145555
channel ch0: restore complete, elapsed time: 00:02:47
Finished restore at 26-DEC-13

Starting recover at 26-DEC-13
released channel: ch0
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 12/26/2013 10:46:36
RMAN-20208: UNTIL CHANGE is before RESETLOGS change

重置incarnation:
RMAN> list incarnation;
List of Database Incarnations
DB Key  Inc Key DB Name  DB ID            STATUS  Reset SCN  Reset Time
------- ------- -------- ---------------- --- ---------- ----------
1       1       BEICENP  1011448374       PARENT  1          06-NOV-12
2       2       BEICENP  1011448374       CURRENT 1304920    25-DEC-13

RMAN> reset database to incarnation 1;
database reset to incarnation 1

RMAN> list incarnation;
List of Database Incarnations
DB Key  Inc Key DB Name  DB ID            STATUS  Reset SCN  Reset Time
------- ------- -------- ---------------- --- ---------- ----------
1       1       BEICENP  1011448374       CURRENT 1          06-NOV-12
2       2       BEICENP  1011448374       ORPHAN  1304920    25-DEC-13

再做恢复：
RMAN> recover database until scn 1304918;

Starting recover at 26-DEC-13
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=152 devtype=DISK

starting media recovery
media recovery failed
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 12/26/2013 10:47:10
ORA-00283: recovery session canceled due to errors
RMAN-11003: failure during parse/execution of SQL statement: alter database recover if needed
 start until change 1304918
ORA-00283: recovery session canceled due to errors
ORA-38727: FLASHBACK DATABASE requires a current control file.

同时alert日志里有以下提示信息： 
alter database recover if needed
 start until change 1304918
Media Recovery Start
Cannot mark control file as backup: flashback database enabled

关闭flashback后，再执行recover：
SQL> alter database flashback off;
Database altered.

RMAN> recover database until scn 1304918;
starting media recovery
media recovery failed
released channel: ch0
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 12/26/2013 11:20:01
ORA-00283: recovery session canceled due to errors
RMAN-11003: failure during parse/execution of SQL statement: alter database recover if needed
 start until change 1304918
ORA-00283: recovery session canceled due to errors
ORA-00600: internal error code, arguments: [kcvsor_current_inc_rdfail], [0], [], [], [], [], [], []

此时会出现ORA-600错误，数据库无法做不完全恢复。

此测试说明，在做跨resetlogs不完全恢复时，应注意选择控制文件版本。不能使用当前控制文件进行恢复(nocatalog情况)。