Chapter06Backup and Recovery Overview

Backup and Recovery Issues [备份与恢复所关心的问题]

  • Protect the database from numerous types of failures
  • Increase Mean-Time-Between-Failures(MTBF) [平均无故障时间]
  • Decrease Mean-Time-To-Reconver(MTTR) [平均抢修时间]
  • Minimize data loss

Categories of Failures

  • Statement failure(sql 语句
    • Causes of Statement Failures
      • Logic error in an application
      • Attempt to enter invalid data into the table
      • Attempt an operation with insufficient privileges
      • Attempt to create a table but exceed allotted quota limits
      • Attempt an INSERT or UPDATE to a table,causing an extent to be allocated,but with insufficient free space available in the tablespace.
    • Resolutions
      • Correct the logcial flow of the program
      • Modify and reissue the SQL statement
      • Provide the necessary database privileges
      • Change the user`s quota limit by using the ALTER USER command
      • Add file space to the tablespace
      • Enable resumable space allocation  
  • User process failure(即:客户端程序错误)
    • Causes User process failure
      • The user performed an abnormal disconnect in the session
      • The user`s session was abnormally terminated
      • The user`s program raised an address exception,which terminated the session
    • Resolutions
      • The PMON process detects an abnormally terminated user process
      • PMON rolls back the transaction and releases any resources and locks being held by it  
  • User error
    • Cause User error
      • DROP TABLE employees;
      • TRUNCATE TABLE employees;
      • DELETE FROM employees;COMMIT;
      • UPDATE employees SET salary = salary * 1.5;COMMIT;
    • Resolution
      • Train the database user
      • Recover from a valid backup
      • Import the table from an export file
      • Use LogMiner to determin the time of error(LogMiner是Oracle提供的一项工具;)
      • Recover with a point-in-time recovery
      • Use LogMiner to perform object-level recovery
      • Use FlashBack to view and repair historical data(FlashBack是Oracle提供的一项工具;)
  • Network failure 
  • Instance failure
    • Instance的本质,就是一堆进程加若干个内存块,最大的内存块就是SGA
    • 最典型的Instance failure就是断电
    • SGA包含两大部分1、Data Buffer Cach;2、Redo Log Buffer Cash;
    • Redo Log File里面的内容时实时更新的,Data Filel里的内容时滞后更新的.
    • Cause User error
    • Recovery from Instance Failures
      • No special recovery action is needed from DBA
      • Start the instance
      • Wait for the "database opened" notification
      • Notify users
      • Check the alert log to determine the reason for the failure
      • SMON进程每次在Instance启动的时候,就会检测Data File与Redo Log File 是否是synchronized,如果不同步,说明需要将Redo Log File里面的内容更新到Data File里,也需要将Undo表空间里的数据同步更新到Data File里面,确保Data File与Redo Log File及Undo Tablespace里面的数据同步.对于Instance Failure来讲,DBA不需要进行手工干预.DBA需要做的就是重新启动Instance,然后等待Instance自我修复(nomounted->mounted->opened).这个过程是SMON进程在操作.
  • Media failure(磁盘、介质、磁带错误)
    • Causes of Media Failures
      • Head crash on a disk drive
      • Physical problem in reading from or writing to database files
      • File was accidentally erased
    • Resolutions
      • The recovery strategy depends on which backup method was chosen and which files are affected.
      • If available,apply archived redo log files to recover data committed since the last backup.
  • Defining a Strategy
    • Business requirements
      • Mean time to recover
      • Mean time between failure
      • Evolutionary process
    • Operational requirements
      • 24-Hour operations
      • Testing and validating backups
      • Database volatility
    • Technical considerations
      • Resources:hardware,software,manpower,and time
      • Physical image copies of the operating system files
      • Logical copies of the objects in the database
      • Database configuration
      • Transaction volume that affects desired frequency of backups
    • Disaster Recovery issues
      • How will your business be affected in the event of a major disater,such as:
        • Earthquake,flood,or file
        • Complete loss of machine
        • Malfunction of storage hardware of software
        • Loss of key personnel,for example the database administrator
      • Do you have a plan for testing your strategy periodically?
    • Management concurrence
    • Summary
      • In this lesson,you should have learned how to:
        • Evaluate potential failures in your enviroment
        • Develop a strategy dicated by business,operational,and technical requirements
        • Consider a test plan for a backup and recovery strategy
原文地址:https://www.cnblogs.com/arcer/p/2963547.html