Instance and Media Recovery Structures

Server process: dedicated server or shared server.

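Of the structures above, the physical files actually stored on disk are those in the lower half of the diagram. Where are they stored? Typically:

Control files, redo log files, data files: stored in the oradata directory

The Oracle software itself is usually installed separately, under $ORACLE_BASE/product (each $ORACLE_HOME lives beneath it)

$ORACLE_BASE also contains another important directory, admin. As the name suggests, it holds administrative files.

The admin directory contains the following: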

Parameter audit_file_dest = $ORACLE_BASE\admin\ADUMP

Parameter background_dump_dest = $ORACLE_BASE\admin\BDUMP

Parameter user_dump_dest = $ORACLE_BASE\admin\UDUMP

Parameter core_dump_dest = $ORACLE_BASE\admin\CDUMP

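Where:

adump: audit files

bdump: background process trace files and the alert log

cdump: core dumps. Files appear here only when something goes wrong with the database, so it normally does not need checking.

pfile: an initialization parameter file (init). This is not where the instance actually reads its initialization parameter file from.

udump: user process trace files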

---------------------------------------------------------------------------

In the dbs directory:

Actual location of the initialization parameter file: the $ORACLE_HOME\dbs directory

Password file location: $ORACLE_HOME\dbs (password file lookup order: orapw<sid>, then orapw, then failure)
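The lookup order above can be pictured with a small sketch. This is a toy model in Python, not Oracle code: the function name `find_password_file`, the SID `orcl`, and the temporary directory standing in for the dbs directory are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_password_file(dbs_dir: Path, sid: str) -> Optional[Path]:
    """Return the first password file found, in the order given in the notes.

    Toy model only: tries orapw<sid>, then orapw, and returns None for the
    "failure" case. The function name and arguments are hypothetical.
    """
    for name in (f"orapw{sid}", "orapw"):
        candidate = dbs_dir / name
        if candidate.is_file():
            return candidate
    return None


def demo_lookup() -> list:
    """Run the lookup against a throwaway directory and collect the results."""
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        dbs = Path(tmp)  # stands in for $ORACLE_HOME/dbs
        results.append(find_password_file(dbs, "orcl"))       # nothing there yet
        (dbs / "orapw").write_text("generic")
        results.append(find_password_file(dbs, "orcl").name)  # falls back to orapw
        (dbs / "orapworcl").write_text("sid specific")
        results.append(find_password_file(dbs, "orcl").name)  # SID-specific file wins
    return results
```

Calling `demo_lookup()` walks through the three cases: no file at all, only the generic file, and both files present.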

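Archived log files: you can specify the location and file name format yourself.

I/O server processes (I/O slaves): DBWn writes asynchronously, but some operating systems do not support asynchronous I/O. These processes serve DBWn by simulating asynchronous I/O.

If you do not explicitly configure a large pool (by setting the large_pool_size parameter), the memory that would have come from it is allocated from the shared pool within the SGA instead.

Uses of the large pool: RMAN can use large pool memory while it runs, and so can the DBWn I/O slaves that simulate asynchronous writes.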

Large Pool Parameters

LARGE_POOL_SIZE: If this parameter is not set, there is no large pool. The specified amount of memory is allocated from the SGA.

select * from v$sgastat where pool = 'large pool'; -- note that the size of the large pool is read from v$sgastat

DBWR_IO_SLAVES: This parameter specifies the number of I/O slaves used by the DBWn process. The DBWn process and its slaves always write to disk. By default, the value is 0 and I/O slaves are not used.

BACKUP_TAPE_IO_SLAVES: Specifies whether Recovery Manager uses I/O slaves to back up, copy, or restore data to tape.

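Data is read by the server process (a server process can read the contents of the disk directly); the main job of DBWn is to write data.

DBWn background process: writes the dirty buffers from the database buffer cache to the data files.

You can configure multiple DBWn processes to improve write performance, for example DBW1, DBW2, and so on.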

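Uncommitted data can certainly end up in the datafiles. For example, suppose you create a new table, the buffer cache is 100 MB, and you insert 10 million rows in one go. If the buffer cache is not large enough to hold them, some of the data has to be written to the datafiles while it is still uncommitted.

The same applies to the redo log files: the contents of the redo log buffer do not have to wait for the user's commit before being written to disk. Remember that the redo log simply records every change made to the database. Its purpose is recovery, and it has little to do with when the actual data is written to disk. I used to get this wrong and assumed that all data was written to disk only after a commit.

Also, a server process always modifies data in the buffer cache in memory. If the buffer cache does not contain the data you need, the server process reads it from disk, and the block it reads goes into memory, since memory is the only place where data can be processed.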
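A rough way to see why uncommitted data reaches the datafiles is a toy buffer cache that has to evict blocks once it is full. The sketch below is Python, not anything Oracle ships. The class `TinyBufferCache`, its oldest-first eviction, and the tiny capacity are assumptions chosen to keep the example short.

```python
from collections import OrderedDict


class TinyBufferCache:
    """Toy buffer cache: holds at most `capacity` blocks, evicts the oldest."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block_id -> value, all of them dirty
        self.datafile = {}           # stands in for the datafile on disk

    def modify(self, block_id: int, value: str) -> None:
        """Change a block in memory, making room first if the cache is full."""
        if block_id not in self.blocks and len(self.blocks) >= self.capacity:
            old_id, old_value = self.blocks.popitem(last=False)
            # The evicted block is dirty, so it has to be written out,
            # whether or not its transaction has committed.
            self.datafile[old_id] = old_value
        self.blocks[block_id] = value


def load_without_commit(rows: int, capacity: int) -> TinyBufferCache:
    """Insert `rows` blocks in a single transaction that never commits."""
    cache = TinyBufferCache(capacity)
    for block_id in range(rows):
        cache.modify(block_id, f"row {block_id} (uncommitted)")
    return cache
```

With `load_without_commit(rows=10, capacity=3)`, seven blocks are already in the toy datafile before any commit happens, and only the last three are still in memory.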

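Every change made to the database is recorded in the redo log buffer. The buffer is written out periodically and does not have to wait for a commit: for example, if the redo log buffer becomes one third full before the commit, its contents are written to the online redo log file. That write covers everything in the redo log buffer, including redo for uncommitted changes.

A commit does not actually write the data to the datafiles. It writes the corresponding redo log buffer contents to the redo log on disk.

Changes to undo, index, and data blocks all show up in the redo log buffer.

That covers commit.

Before DBWn writes a modified block, the redo log buffer contents that describe it must be written to disk.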
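The flush rules described above (flush at one third full, flush on commit, and redo on disk before DBWn writes the block) can be sketched as a toy model. Everything here is an assumption made for illustration: the class `RedoLog`, the helper `dbwn_write`, counting the buffer in entries rather than bytes, and using a simple counter as the SCN.

```python
class RedoLog:
    """Toy redo log buffer plus the online redo log it is flushed to."""

    def __init__(self, buffer_slots: int):
        self.buffer_slots = buffer_slots
        self.buffer = []    # redo entries still only in memory
        self.on_disk = []   # redo entries written to the online redo log
        self.next_scn = 1

    def flush(self) -> None:
        """LGWR role: write everything in the buffer, committed or not."""
        self.on_disk.extend(self.buffer)
        self.buffer.clear()

    def record(self, txn: str, change: str) -> int:
        """Add a redo entry and flush once the buffer is one third full."""
        scn = self.next_scn
        self.next_scn += 1
        self.buffer.append((scn, txn, change))
        if len(self.buffer) * 3 >= self.buffer_slots:
            self.flush()
        return scn

    def commit(self, txn: str) -> int:
        """A commit adds a commit entry and forces a flush of the buffer."""
        scn = self.record(txn, "COMMIT")
        self.flush()
        return scn

    def flushed_up_to(self) -> int:
        """Highest SCN whose redo is already on disk (0 if none)."""
        return self.on_disk[-1][0] if self.on_disk else 0


def dbwn_write(log: RedoLog, block_scn: int) -> bool:
    """DBWn role: make sure the block's redo is on disk before writing it.

    Returns True when a redo flush had to be forced first.
    """
    if log.flushed_up_to() < block_scn:
        log.flush()
        return True
    return False


def demo_redo():
    log = RedoLog(buffer_slots=9)
    log.record("T1", "update row 1")
    log.record("T1", "update row 2")
    log.record("T2", "update row 3")   # buffer is 1/3 full: flushed uncommitted
    flushed_before_commit = len(log.on_disk)
    scn = log.record("T2", "update row 4")
    forced = dbwn_write(log, scn)      # redo must reach disk before the block
    log.commit("T1")
    return flushed_before_commit, forced, log.on_disk[-1][2], len(log.buffer)
```

In `demo_redo`, three uncommitted entries reach the redo log on disk before any commit, and the DBWn write forces one more flush.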

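Put the online redo logs on the fastest disks available, because they are written to constantly.

Checkpoint: writes the dirty data in the database buffer cache out to the datafiles on disk.

The checkpoint determines the position from which recovery starts.

Checkpoint position: a point in the online redo log. For everything before this point, both the redo log file and the data file writes are complete (this is where the two are brought into sync).

Checkpoints synchronize the buffer cache by writing all buffers to disk whose corresponding redo entries were part of the log file being checkpointed.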

Checkpoint Process(CKPT) Features

The CKPT process is always enabled.
The CKPT process updates file headers at checkpoint completion.
More frequent checkpoints reduce the time needed for recovering from instance failure at the possible expense of performance.

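I also used to misunderstand the checkpoint as a clean point that recovery could simply refer back to. In fact, a checkpoint only writes the contents of the buffer cache to disk. The result is not clean, because plenty of uncommitted data is written to disk at the same time. It is synchronized, though: a redo log switch triggers a checkpoint, and DBWn cannot write a block before LGWR has written its redo, so at that point DBWn and LGWR appear to be in step.

Because a checkpoint is a synchronization point, everything before it is guaranteed to be synchronized. So when a checkpoint exists, recovery can start from the last checkpoint and proceed using the redo logs, the archived logs, or backup files.

Synchronization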

At each checkpoint, the checkpoint number is updated in every database file header and in the control file.
The checkpoint number acts as a synchronization marker for redo, control, and datafiles. If they have the same checkpoint number, the database is considered to be in a consistent state.
Information in the control file is used to confirm that all files are at the same checkpoint number during database startup. Any inconsistency between the checkpoint numbers in the various file headers results in a failure, and the database cannot be opened. Recovery is required.

Instance Recovery

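Checkpoints expedite instance recovery because at every checkpoint all changed data is written to disk. Once the data resides in the datafiles, redo log entries before the last checkpoint need not be applied again during the roll forward phase of instance recovery.

Checkpoint queue: an in-memory queue of dirty buffers that need to be written to the datafiles. Each entry mainly records addresses, such as the address of the dirty buffer in the buffer cache and the corresponding address in the redo log file. The entries form a queue, and changes are written to disk in queue order.

Checkpoint types

1 Full: all dirty data is written to the datafiles. This typically happens at shutdown and similar events.

2 Incremental: one large checkpoint is split into many small checkpoints that occur frequently.

3 Partial: when an operation is performed on a tablespace, the dirty data belonging to that tablespace is written to its datafiles on disk.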
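To make the three checkpoint types concrete, here is a toy checkpoint queue in Python. It is only a sketch: the `DirtyBuffer` fields, the block and tablespace names, and the SCN values are invented, and a real checkpoint queue records addresses rather than these simplified tuples.

```python
from collections import deque
from typing import NamedTuple


class DirtyBuffer(NamedTuple):
    first_dirty_scn: int  # redo position at which the block first became dirty
    block: str
    tablespace: str


def make_queue() -> deque:
    """Checkpoint queue: dirty buffers ordered by when they were first dirtied."""
    return deque([
        DirtyBuffer(101, "users:12", "USERS"),
        DirtyBuffer(105, "indx:3", "INDX"),
        DirtyBuffer(109, "users:40", "USERS"),
        DirtyBuffer(114, "indx:9", "INDX"),
    ])


def full_checkpoint(queue: deque) -> list:
    """Write every dirty buffer, as at a clean shutdown."""
    written = list(queue)
    queue.clear()
    return written


def incremental_checkpoint(queue: deque, count: int) -> list:
    """Write only the `count` oldest dirty buffers, a small step at a time."""
    written = []
    while queue and len(written) < count:
        written.append(queue.popleft())
    return written


def partial_checkpoint(queue: deque, tablespace: str) -> list:
    """Write only the dirty buffers that belong to one tablespace."""
    written = [b for b in queue if b.tablespace == tablespace]
    remaining = [b for b in queue if b.tablespace != tablespace]
    queue.clear()
    queue.extend(remaining)
    return written


def checkpoint_position(queue: deque):
    """In this toy, recovery would start at the oldest change not yet on disk."""
    return queue[0].first_dirty_scn if queue else None
```

After `incremental_checkpoint(queue, 1)` the toy checkpoint position moves from 101 to 105, which is the sense in which frequent small checkpoints shorten the redo that roll forward has to replay.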

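Redo log files are used in a circular fashion. Media recovery still needs the old redo log contents, so once they have been overwritten, media recovery becomes very difficult. That is why the contents are also written to archived log files.

In ARCHIVELOG mode, a redo log file must be archived before it can be overwritten.

Data Guard: essentially ships the archived log files to a remote site.

An Oracle database cannot be opened unless all datafiles, redo logs, and control files are synchronized. If they are not, recovery is required.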

For the database to open, all datafiles must have the same checkpoint number, unless they are offline or part of a read-only tablespace.

Archived and online redo log files recover committed transactions and roll back uncommitted transactions to synchronize the database files.

Archived and online redo log files are automatically requested by the Oracle server during the recovery phase. Make sure logs exist in the requested location.

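Note: undo data is stored on disk rather than held only in memory.

Example: a power failure

1. The files fall out of sync, for example because of a sudden power failure.

2. Roll forward comes first: everything in the redo log files after the last checkpoint is replayed. (Replaying it also regenerates undo, that is, rollback data.)

3. The datafiles now contain both committed and uncommitted data. (The redo log was replayed in full, so uncommitted data is inevitably included.)

4. Rollback phase: changes that were never committed are rolled back.

5. The datafiles now contain only committed data.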
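The five steps above can be traced with a toy model. This is a sketch under simplifying assumptions, not how Oracle is implemented: the redo entry layout, the function `crash_recover`, and the sample data are all made up, and the undo information is kept in a plain list instead of undo segments.

```python
# State of the toy datafile at the crash: it reflects the checkpoint at SCN 2,
# including an uncommitted change to "b" made by transaction T2.
DATAFILE_AT_CRASH = {"a": 10, "b": 20}
CHECKPOINT_SCN = 2

# Redo entries: (scn, txn, "SET", key, old_value, new_value) or (scn, txn, "COMMIT")
REDO_LOG = [
    (1, "T1", "SET", "a", 1, 10),
    (2, "T2", "SET", "b", 2, 20),
    (3, "T1", "SET", "c", None, 30),
    (4, "T1", "COMMIT"),
    (5, "T2", "SET", "a", 10, 11),   # T2 never commits before the crash
]


def crash_recover(datafile, redo_log, checkpoint_scn):
    """Roll forward from the checkpoint, then roll back uncommitted changes."""
    data = dict(datafile)
    undo = []          # (txn, key, old_value), oldest first
    committed = set()

    # Roll forward: replay every change after the checkpoint, committed or not.
    for scn, txn, kind, *change in redo_log:
        if kind == "COMMIT":
            committed.add(txn)
            continue
        key, old_value, new_value = change
        # Undo is protected by redo too, so it is available again after replay.
        # Older undo would already be on disk; the toy keeps both in one list.
        undo.append((txn, key, old_value))
        if scn > checkpoint_scn:
            data[key] = new_value
    after_roll_forward = dict(data)

    # Roll back: undo the changes of transactions that never committed,
    # newest change first.
    for txn, key, old_value in reversed(undo):
        if txn in committed:
            continue
        if old_value is None:
            data.pop(key, None)
        else:
            data[key] = old_value
    return after_roll_forward, data
```

Running `crash_recover(DATAFILE_AT_CRASH, REDO_LOG, CHECKPOINT_SCN)` returns the state after step 3 (committed and uncommitted changes mixed together) and the state after step 5 (only the work of T1, which committed).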

Roll forward phase

During the roll forward phase, Oracle replays transactions in the online redo log beginning with the checkpoint position. The checkpoint position is the place in the redo log where changes associated with previous redo entries had been saved to the datafiles before the failure. (Each data file has a checkpoint structure in its header whose contents are incremented every time LGWR issues a checkpoint to DBWR. The checkpoint structure has two parts: a checkpoint counter and an SCN.) At the conclusion of the roll forward phase, the data files contain all committed changes, as well as new uncommitted changes applied during roll forward.

In other words, replaying the redo log also generates new undo, the so-called new undo segment content. The undo segments also hold a lot of content from before the failure, which is the old undo segment content. Pay attention to what is in undo.

Rollback phase

During the rollback phase, Oracle searches out changes associated with dead transactions that had not committed before the failure occurred. (Note: only content from before the failure, the old undo segment content, is examined, not the entire undo segment, because the undo segment also contains the new undo, which does not need to be rolled back.) Once this step completes, the Oracle database is synchronized and can be opened.

When the database is opened, a start SCN is recorded in the control file for every data file associated with the database, and a stop SCN is set to infinity.

During normal database operation, the SCN and the checkpoint counter in the data file header are incremented every time a checkpoint is done.

When the database is shut down with the normal or immediate option, an end SCN is recorded in the data file header. This information is also recorded in the control file, so that, for example, the end SCN of the datafile is equal to the stop SCN in the control file.

When the database is opened the next time, Oracle makes two checks:

Whether the end SCN in the data file header matches its corresponding stop SCN in the control file.
Whether the checkpoint in the data file header matches its corresponding checkpoint in the control file.

Suppose you shut down the database in ABORT mode:

No checkpoint is performed, and the stop SCN in the control file is left at infinity (the same state as when you started or opened your data files).
For example, the end SCN in the datafile header is "1000" and the stop SCN in the control file is "infinity".

In this case Oracle performs crash recovery. As part of crash recovery, Oracle reads the online redo log files and applies the changes to the database (roll forward), then reads the rollback segment's transaction table to perform transaction recovery (roll back).

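Tuning the recovery process

System metrics: performance and availability

It follows that frequent checkpoints improve recovery time, because recovery starts from the last checkpoint.

User-specified bounds: targets set by the user

Recent versions of Oracle provide a new parameter for configuring this, the first parameter: FAST_START_MTTR_TARGET

Our system is set to FAST_START_MTTR_TARGET = 300

So this one parameter is all you need. If the value is set inappropriately, for example to 1 second, Oracle works out a suitable value based on the limits of the system.

Once the redo has been replayed, the Oracle database is opened, even though undo has not yet been rolled back.

More processes are brought in to handle this: SMON directs other processes to help with the rollback.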

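Reposted

I came across a discussion on the ITPUB forum about roll forward and roll back during instance recovery. Here is a short summary, which also helps me straighten out my own thinking.

I. When is instance recovery needed?

With shutdown normal or shutdown immediate, a so-called clean shutdown, a checkpoint is triggered automatically and the SCN is written back. When a checkpoint occurs, the SCN is written to four places.

Three are in the control file:

(1) System checkpoint SCN

(2) Datafile checkpoint SCN

(3) Stop SCN

One is in the datafile header:

Start SCN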

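1.1 Clean shutdown

On a clean shutdown a checkpoint is performed, and afterwards the stop SCN recorded for the datafile in the control file equals the start SCN in the datafile header. When the database is opened, Oracle checks whether the start SCN in the datafile header matches the datafile SCN stored in the control file. If it does, Oracle then checks whether the start SCN and the stop SCN match. If they also match, the database opens normally; otherwise recovery is required.

Once the database is open, the stop SCN stored in the control file is reset to NULL, which indicates that the datafile is open in normal mode.

1.2 Abnormal shutdown

After an abnormal shutdown (shutdown abort), mounting the database shows that the stop SCN does not equal the SCNs in the other locations but is NULL. This means Oracle did not perform a checkpoint at shutdown, so crash recovery (instance recovery) must be performed at the next startup.

Note:

(1) If STOP SCN = NULL is found at startup, crash recovery is required.

(2) If the START SCN in a datafile header differs from the DATAFILE SCN stored in the control file at startup, media recovery is required.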
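The startup checks in 1.1 and 1.2 amount to a small decision procedure, sketched below in Python. The function `startup_action`, its argument names, and the returned strings are assumptions made for illustration; `None` stands in for a NULL stop SCN.

```python
from typing import Optional


def startup_action(stop_scn: Optional[int],
                   controlfile_datafile_scn: int,
                   header_start_scn: int) -> str:
    """Apply the checks from the notes to one datafile at startup.

    stop_scn is None when the control file still holds NULL, meaning the
    database was not shut down cleanly.
    """
    # Check 1: datafile header against the datafile SCN in the control file.
    if header_start_scn != controlfile_datafile_scn:
        return "media recovery"
    # Check 2: a NULL stop SCN means no checkpoint was taken at shutdown.
    if stop_scn is None:
        return "crash recovery"
    # Check 3: after a clean shutdown the start and stop SCNs agree.
    if stop_scn == header_start_scn:
        return "open normally"
    return "recovery"
```

For example, `startup_action(None, 1000, 1000)` models a shutdown abort, and `startup_action(1000, 1000, 900)` models a datafile restored from an older backup.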

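1.3 Order of crash recovery

Roll forward must come first: starting from the current start SCN, the changes recorded after it in the redo log files are replayed. Then unfinished (dead) transactions are rolled back using the rollback segments. Finally, the SCN in the control file is verified to equal the SCN in the datafile headers.

II. The crash recovery process

When the database crashes suddenly, the dirty blocks in the buffer cache have not yet been flushed to the datafiles, and transactions that were running at the time of the crash are cut off in an intermediate state, neither committed nor rolled back. The contents of the datafiles then do not reflect the state of the instance at the moment of the crash. A database shut down this way is inconsistent.

The next time the instance starts, Oracle performs instance recovery automatically through the SMON process. At startup, SMON checks the END SCN recorded in the control file for every online, read-write datafile.

While the database is running normally, this END SCN is always NULL. When the database is shut down cleanly, a full checkpoint is performed and the field is updated with the checkpoint SCN.

In a crash, Oracle has no chance to update the field, so it remains NULL. When SMON finds the field empty, it knows the instance was not shut down cleanly last time and begins instance recovery.

During instance recovery, SMON obtains the checkpoint position from the control file, locates that position in the online redo log files, and applies every redo entry from there onward. This restores the buffer cache to its state at the moment of the crash. The process is called roll forward. When it finishes, the buffer cache contains dirty blocks that were committed but not yet written to the datafiles at the time of the crash, as well as blocks dirtied by transactions that were cut off and so were neither committed nor rolled back.

As soon as roll forward completes, SMON opens the database. At this point the database still contains dirty blocks in that intermediate state, neither committed nor rolled back. Such blocks cannot be allowed to remain, because they were never committed and must be rolled back. After the database is opened, SMON performs the rollback in the background.

Sometimes, after the database has been opened but before SMON has rolled back these intermediate-state blocks, a user process asks to read them. In that case the server process rolls the blocks back itself before returning their contents to the user.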

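III. Why instance recovery rolls forward first and then rolls back

Rollback segments actually exist as part of an undo tablespace. Since it is a tablespace, it has datafiles of its own, and its blocks have images in the buffer cache, just like the datafiles of any other tablespace.

When a DML operation occurs, it generates both redo (the redo entry for the DML operation itself) and undo (used to roll the DML operation back, recorded in the undo tablespace). Because the undo information is itself stored in the undo tablespace, the undo for the DML operation (an undo block created in the buffer cache) first generates its own redo (the undo block's redo entry), which is written to the log buffer.

The reason is that undo tablespace blocks in the buffer cache can also be lost in a database failure. To make sure rollback can proceed at the next startup, the redo log must first be used to restore the undo segments (in practice, the dirty blocks are restored in the buffer cache first and then written to the undo segments by a checkpoint). After the database is opened, the undo information is used to roll back and bring the database to a consistent state.

Only after the undo block's redo entry has been generated is the redo entry for the DML statement itself generated. Finally the block in the buffer cache is modified, which makes it a dirty block.

Put simply, the job of redo is to record every change to the database, including changes to the undo tablespace.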