Oracle: Recovering Accidentally Damaged OCR/Voting Disks

This document applies when you want to reuse the existing Oracle software on the original servers (the shared storage has been reclaimed) rather than rebuild the RAC cluster with a full reinstall.

Goal

The goal of this document is to help customers who have accidentally deleted the OCR, the voting disk, or other files required for the operation of Oracle Clusterware.

Depending on the issue, it may or may not be a good idea to execute the steps provided.

  • OCR
    • If the OCR has been deleted, check whether the OCR mirror is intact (and vice versa). It may be prudent to recreate the OCR from the mirror. For the steps, see the documentation: Oracle Database Oracle Clusterware and Oracle Real Application Clusters Administration and Deployment Guide
    • If both the OCR and the OCR mirror have been deleted, it may be faster to restore the OCR from the OCR backups. For the steps, see the same guide.
  • Voting Disk
    • If there are multiple voting disks and one was accidentally deleted, check whether there is a backup of that voting disk. If there is no backup, a new one can be added with the crsctl add css votedisk command. The complete steps are in the Oracle Database Oracle Clusterware and Oracle Real Application Clusters Administration and Deployment Guide
  • SCLS directories
    • These are internal-only directories created by root.sh. If such a directory is accidentally removed, it can only be recreated by following the steps documented below.
  • Socket files in /tmp/.oracle or /var/tmp/.oracle
    • If these files are accidentally deleted, stop the Oracle Clusterware stack on that node and start it again; this recreates the socket files. If the cssd socket files are deleted, the Oracle Clusterware stack may fail to come down, in which case the node has to be rebooted.
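For the OCR backup-restore path mentioned above, a minimal sketch using the standard ocrconfig commands (the backup file path is illustrative; run with the clusterware stack down on all nodes):

```shell
# As root, with CRS stopped cluster-wide
ocrconfig -showbackup                              # list automatic OCR backups
ocrconfig -restore /oracle/crs/cdata/crs/backup00.ocr   # restore a chosen backup (path illustrative)
ocrcheck                                           # verify the restored registry
```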
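For re-adding a lost voting disk, a sketch of the crsctl command (device path illustrative; on 10.2 the -force flag is accepted only while CRS is down on all nodes):

```shell
# As root, with CRS stopped cluster-wide
crsctl add css votedisk /dev/raw/raw4 -force   # add the replacement voting disk
crsctl query css votedisk                      # confirm the new voting disk is listed
```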
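For recreating the socket files, the restart sequence described above can be sketched as:

```shell
# On the affected node only, as root
crsctl stop crs    # stop the stack (reboot the node if it will not come down)
crsctl start crs   # socket files under /tmp/.oracle or /var/tmp/.oracle
                   # are recreated at startup
```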

Resolution

Linux/UNIX, versions 10.2 through 11.1

Oracle Server - Standard Edition - Version: 10.2.0.1 to 11.1.0.7 - Release: 10.2 to 11.1
Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 11.1.0.7 [Release: 10.2 to 11.1]
Generic UNIX
Generic Linux
Oracle Clusterware

Detailed Steps

If none of the steps documented above can be used to restore the file that was accidentally deleted or is corrupted, then the following steps can be used to re-create/reinstantiate these files. The following steps require complete downtime on all the nodes.

1. Stop the clusterware stack (all nodes)

Shut down the Oracle Clusterware stack on all the nodes with the crsctl stop crs command as the root user.

# su - root
 srvctl stop nodeapps -n <nodename>
 or
 crsctl stop crs
2. Back up the software home

Backup the entire Oracle Clusterware home

tar -Pczpf app.tgz /oracle/app --exclude=*.trc --exclude=*.trm --exclude=/oracle/app/grid/diag/tnslsnr --exclude=*.aud 
3. Remove the init-related configuration

Execute <CRS_HOME>/install/rootdelete.sh on all nodes

su - root
export CRS_HOME='/ups/oracle/cluster/10.2/crs_1'
${CRS_HOME}/install/rootdelete.sh

What the script does: removes the CRS entries from the inittab file, stops the CRS processes, and removes the related files under the init directories.

4. Clear the cluster configuration with rootdeinstall.sh (first node only)

Execute <CRS_HOME>/install/rootdeinstall.sh on the node which is supposed to be the first node

${CRS_HOME}/install/rootdeinstall.sh

What the script does: wipes the OCR device contents.

5. Check for leftover clusterware processes

The following commands should return nothing

ps -e  | grep -i 'ocs[s]d'
ps -e  | grep -i 'cr[s]d\.bin'
ps -e  | grep -i 'ev[m]d\.bin'

# or, in a single command (the brackets keep grep from matching itself):
ps -ef | grep -Ei 'ocs[s]d|cr[s]d\.bin|ev[m]d\.bin'
6. Run root.sh
  • Run it on node 1 first

Execute <CRS_HOME>/root.sh on first node

sh ${CRS_HOME}/root.sh

# node1 output:
[root@node1 ~]# sh ${CRS_HOME}/root.sh
WARNING: directory '/ups/oracle/cluster/10.2' is not owned by root
WARNING: directory '/ups/oracle/cluster' is not owned by root
WARNING: directory '/ups/oracle' is not owned by root
Checking to see if Oracle CRS stack is already configured
/etc/oracle does not exist. Creating it now.

Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/ups/oracle/cluster/10.2' is not owned by root
WARNING: directory '/ups/oracle/cluster' is not owned by root
WARNING: directory '/ups/oracle' is not owned by root
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node <nodenumber>: <nodename> <private interconnect name> <hostname>
node 1: node1 node1-priv node1
node 2: node2 node2-priv node2
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Now formatting voting device: /dev/raw/raw3
Now formatting voting device: /dev/raw/raw4
Now formatting voting device: /dev/raw/raw5
Format of 3 voting devices complete.
Startup will be queued to init within 90 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
	node1
CSS is inactive on these nodes.
	node2
Local node checking complete.
Run root.sh on remaining nodes to start CRS daemons.


# node2 output:
[root@node2 crs_1]# ${CRS_HOME}/root.sh
WARNING: directory '/ups/oracle/cluster/10.2' is not owned by root
WARNING: directory '/ups/oracle/cluster' is not owned by root
WARNING: directory '/ups/oracle' is not owned by root
Checking to see if Oracle CRS stack is already configured

Setting the permissions on OCR backup directory
Setting up NS directories
Oracle Cluster Registry configuration upgraded successfully
WARNING: directory '/ups/oracle/cluster/10.2' is not owned by root
WARNING: directory '/ups/oracle/cluster' is not owned by root
WARNING: directory '/ups/oracle' is not owned by root
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node <nodenumber>: <nodename> <private interconnect name> <hostname>
node 1: node1 node1-priv node1
node 2: node2 node2-priv node2
clscfg: Arguments check out successfully.

NO KEYS WERE WRITTEN. Supply -force parameter to override.
-force is destructive and will destroy any previous cluster
configuration.
Oracle Cluster Registry for cluster has already been initialized
Startup will be queued to init within 90 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
CSS is active on these nodes.
	node1
	node2
CSS is active on all nodes.
Waiting for the Oracle CRSD and EVMD to start
Oracle CRS stack installed and running under init(1M)
Running vipca(silent) for configuring nodeapps
Error 0(Native: listNetInterfaces:[3])
  [Error 0(Native: listNetInterfaces:[3])]
[root@node2 crs_1]#
  • Change the OCR disk ownership: oracle:oinstall -> root:oinstall
  • Configure the OCR and voting disk storage
  • Configure the /etc/inittab file

Check

${CRS_HOME}/bin/ocrcheck
${CRS_HOME}/bin/crsctl query  css votedisk
  • Run root.sh on the remaining cluster nodes in turn

After root.sh completes successfully on the first node, execute root.sh on the remaining nodes of the cluster.

sh ${CRS_HOME}/root.sh

7. Configure the ONS (Oracle Notification Service)

For 10gR2, use racgons. For 11gR1, use the onsconfig command. Examples of each are provided below.

Configuration file: ${CRS_HOME}/opmn/conf/ons.config

  • For 10g

Execute the following as the owner of the CRS_HOME (generally oracle):

<CRS_HOME>/bin/racgons add_config hostname1:port hostname2:port

${CRS_HOME}/bin/racgons add_config node1:6200 node2:6200

  • For 11gR1

Execute the following as the owner of the CRS_HOME (generally oracle):

<CRS_HOME>/install/onsconfig add_config hostname1:port hostname2:port

${CRS_HOME}/install/onsconfig add_config halinux1:6200 halinux2:6200

8. Configure the cluster network interfaces

Execute <CRS_HOME>/bin/oifcfg setif -global as the owner of the CRS_HOME (generally oracle). Please review Note 283684.1 for details.

su - root
${CRS_HOME}/bin/oifcfg setif -global eth1/172.168.10.0:cluster_interconnect eth0/192.168.10.0:public
# Verify
${CRS_HOME}/bin/oifcfg getif

9. Configure the VIPs (vipca)

unset LD_ASSUME_KERNEL
${CRS_HOME}/bin/vipca -silent -nodelist node1,node2 -nodevips node1/192.168.10.159,node2/192.168.10.160
10. Configure the listeners

Add the listeners using netca. This may raise errors if listener.ora already contains the entries. If so, move listener.ora from $ORACLE_HOME/network/admin (or from the $TNS_ADMIN directory, if the TNS_ADMIN environment variable is defined) to /tmp, then run netca and re-add all the listeners that existed before.

  • Remove the old listener configuration

    DB_HOME=/ups/oracle/database/10.2/db_1
    mv ${DB_HOME}/network/admin/listener.ora ${DB_HOME}/network/admin/listener.bak
    
    netca /silent /responseFile netca.rsp
    
  • Reuse the old listener configuration (all nodes)

    # crs_profile creates the .cap file (default path: ${CRS_HOME}/crs/public/)
    ${CRS_HOME}/bin/crs_profile -create ora.node1.LISTENER.lsnr -t application -a ${CRS_HOME}/bin/racgwrap -d "CRS application for listener on node" -h node1 -r ora.node1.vip -p restricted -o as=1,ci=600,st=600,ra=5
    
    # Register the resource with the OCR
    ${CRS_HOME}/bin/crs_register ora.node1.LISTENER.lsnr
    
    # Remove the resource from the OCR
    ${CRS_HOME}/bin/crs_unregister ora.node1.LISTENER.lsnr
    

Contents of the .cap file

[oracle@RLZYGLXTDB01 public]$ cat ora.RLZYGLXTDB01.LISTENER_RLZYGLXTDB01.lsnr.cap
NAME=ora.RLZYGLXTDB01.LISTENER_RLZYGLXTDB01.lsnr
TYPE=application
ACTION_SCRIPT=/oracle/product/10.2.0/bin/racgwrap
ACTIVE_PLACEMENT=0
AUTO_START=1
CHECK_INTERVAL=600
DESCRIPTION=CRS application for listener on node
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=RLZYGLXTDB01
OPTIONAL_RESOURCES=
PLACEMENT=restricted
REQUIRED_RESOURCES=ora.rlzyglxtdb01.vip
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=600
START_TIMEOUT=0
STOP_TIMEOUT=0
UPTIME_THRESHOLD=7d
USR_ORA_ALERT_NAME=
USR_ORA_CHECK_TIMEOUT=0
USR_ORA_CONNECT_STR=/ as sysdba
USR_ORA_DEBUG=0
USR_ORA_DISCONNECT=false
USR_ORA_FLAGS=
USR_ORA_IF=
USR_ORA_INST_NOT_SHUTDOWN=
USR_ORA_LANG=
USR_ORA_NETMASK=
USR_ORA_OPEN_MODE=
USR_ORA_OPI=false
USR_ORA_PFILE=
USR_ORA_PRECONNECT=none
USR_ORA_SRV=
USR_ORA_START_TIMEOUT=0
USR_ORA_STOP_MODE=immediate
USR_ORA_STOP_TIMEOUT=0
USR_ORA_VIP=
[oracle@RLZYGLXTDB01 public]$ pwd
/oracle/crs/crs/public

Notes:

  • AUTO_START may also be given as 0, 1, or 2, where 0 is equivalent to always, 1 to restore, and 2 to never
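For example, following the -update syntax shown in the appendix, the AUTO_START of the listener resource could be changed and re-registered like this (the resource name is the illustrative one from step 10):

```shell
# Change AUTO_START to 2 (never) in the profile, then push the
# updated profile into the OCR with crs_register -u
${CRS_HOME}/bin/crs_profile -update ora.node1.LISTENER.lsnr -o as=2
${CRS_HOME}/bin/crs_register ora.node1.LISTENER.lsnr -u
```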
11. Configure the nodeapps
# List the currently registered resources
${CRS_HOME}/bin/crs_stat -p

# Add the gsd/ons/vip resources to the OCR
${CRS_HOME}/bin/srvctl add nodeapps -n node1 -o ${DB_HOME} -A node1-vip/255.255.255.0/eth0

# Verify
${CRS_HOME}/bin/crs_stat -p
${CRS_HOME}/bin/srvctl config nodeapps -n node1 -a -g -o -s -l
12. Configure ASM

Add the ASM and database resources to the OCR using the appropriate srvctl add commands, as the user who owns the ASM and database resources. Make sure this is not run as root. If the shared storage must be reinitialized from scratch, use the dbca tool to recreate the ASM instance and the database instead.

  • Reuse the existing ASM

    srvctl add asm -n <node_name> -i <asm_inst_name> -o <oracle_home> [-p <spfile>]
    
    srvctl add asm -n node1 -i +ASM1 -o ${DB_HOME}
    srvctl add asm -n node2 -i +ASM2 -o ${DB_HOME}
    
  • Recreate ASM

    dbca -configureASM -nodelist node1,node2 -asmSysPassword oracle -diskString '/dev/raw/raw*' -diskList '/dev/raw/raw5' -diskGroupName data -redundancy external
    
13. Configure the database resources

Add Instance, services using appropriate srvctl add commands. Please refer to the documentation for the exact commands.

  • Reuse the existing database resources

    srvctl add database -d orcl -o '/ups/oracle/database/10.2/db_1' -p '+DATA/ORCL/spfileorcl.ora'
    
    srvctl add instance -d orcl -i orcl1 -n node1
    srvctl add instance -d orcl -i orcl2 -n node2
    
  • Recreate the database

    dbca -silent -createDatabase -templateName General_Purpose.dbc -gdbName orcl -sid orcl -sysPassword 123456 -systemPassword 123456 -storageType ASM -diskGroupName DATA -datafileJarLocation $ORACLE_HOME/assistants/dbca/templates -nodeinfo node1,node2 -characterset ZHS16GBK -obfuscatedPasswords false -sampleSchema false -asmSysPassword 123456 -databaseType OLTP
    
14. Verify

Execute cluvfy stage -post crsinst -n node1,node2, replacing node1,node2 with the node names of your cluster.

${CRS_HOME}/bin/cluvfy stage -post crsinst -n node1,node2

# Check the cluster resource status
crs_stat | awk 'BEGIN { FS="=| "; state = 0;
    printf "%-50s %-10s %-10s %-10s\n", "Name", "Target", "State", "Host";
    printf "%-50s %-10s %-10s %-10s\n", "------------------------------", "----------", "---------", "-------"; }
  $1~/NAME/ { appname = $2; state = 1 }
  state == 0 { next }
  $1~/TARGET/ && state == 1 { apptarget = $2; state = 2 }
  $1~/STATE/ && state == 2 { appstate = $2; apphost = $4; state = 3 }
  state == 3 { printf "%-50s %-10s %-10s %-10s\n", appname, apptarget, appstate, apphost; state = 0 }'
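The state-machine awk filter can be tried offline against a hand-written crs_stat sample (the resource name below is illustrative):

```shell
# Minimal fake crs_stat output: one resource, key=value lines
cat > /tmp/crs_stat_sample.txt <<'EOF'
NAME=ora.node1.vip
TYPE=application
TARGET=ONLINE
STATE=ONLINE on node1
EOF
# Same awk program, reading the sample instead of live crs_stat output
awk 'BEGIN { FS="=| "; state = 0 }
  $1~/NAME/ { appname = $2; state = 1 }
  state == 0 { next }
  $1~/TARGET/ && state == 1 { apptarget = $2; state = 2 }
  $1~/STATE/ && state == 2 { appstate = $2; apphost = $4; state = 3 }
  state == 3 { printf "%-50s %-10s %-10s %-10s\n", appname, apptarget, appstate, apphost; state = 0 }' \
  /tmp/crs_stat_sample.txt
# prints one column-padded row: ora.node1.vip  ONLINE  ONLINE  node1
```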

Appendix

References

How to Recreate OCR/Voting Disk Accidentally Deleted [ID 399482.1]

CRS on Windows: How To Reinitialize (Accidentally Deleted) OCR and Vote Disk (without a full reinstall of Oracle Clusterware) (Doc ID 557178.1)

Related tool usage

crs_profile

In Oracle RAC, all CRS resources are stored on the OCR disk; crs_profile is used to manage the cluster resource profile files.

Use crs_profile to create a resource profile. By default, when no directory is specified, newly created profiles are placed under $ORA_CRS_HOME/crs/public with a .cap suffix.

Usage:  crs_profile -create resource_name -t application
          [-dir directory_path] [-a action_script] [-B binary_pathname]
          [-d description] [-h hosting_members] [-r required_resources]
          [-l optional_resources] [-p placement_policy]
          [-o as=auto_start,ci=check_interval,ft=failure_threshold,
          fi=failure_interval,ra=restart_attempts,fd=failover_delay,
          st=script_timeout,ap=active_placement,bt=rebalance,
          ut=uptime_threshold,rt=start_timeout,pt=stop_timeout] [-f] [-q]

        crs_profile -create resource_name -I template_file [-dir directory_path] [-f] [-q]

        crs_profile -delete resource_name [-dir directory_path] [-q]

        crs_profile -print [resource_name [...]] [-dir directory_path] [-q]

        crs_profile -template resource_name [-dir directory_path] [-O template_file]

        crs_profile -template -t application [-O template_file]

        crs_profile -update resource_name [-dir directory_path] [option ...] [-o option,...] [-q]

        crs_profile -validate resource_name [-dir directory_path] [-q]

crs_register

crs_register registers a resource with CRS. It is typically used after first creating a profile with crs_stat -p or crs_profile. crs_register can also update an existing CRS resource.

Usage:  crs_register resource_name [-dir directory_path] [...] [-u] [-f] [-q]
        crs_register resource_name -update [option ...] [-o option,...] -q
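A sketch of the crs_stat -p route mentioned above (the resource name and default profile directory are the illustrative ones used earlier in this document):

```shell
# Dump an existing resource's profile to the default profile
# directory, then (re)register the resource from that .cap file
${CRS_HOME}/bin/crs_stat -p ora.node1.LISTENER.lsnr > \
    ${CRS_HOME}/crs/public/ora.node1.LISTENER.lsnr.cap
${CRS_HOME}/bin/crs_register ora.node1.LISTENER.lsnr
```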

crs_unregister

Usage: crs_unregister resource_name [...] [-q]

vipca

# Syntax
vipca [ -silent ] -nodelist <node1[,..]> -nodevips <node-name/ip-name|ip-addr[/netmask[/interface[|interface-i]][,...]> -vipfile <vipFile path> -orahome <Oracle home path>
Note: vipFile should contain inputs specified one per line per node in the following format: <node-name>=<ip-name|ip-addr>/<netmask>/<interface[|interface-i]>

dbca

dbca  [-silent | -progressOnly | -customCreate] {<command> <options> }  | { [<command> [options] ] -responseFile  <response file > } [-continueOnNonFatalErrors <true | false>]
Please refer to the manual for details.
You can enter one of the following command:

Create a database by specifying the following parameters:
        -createDatabase
                -templateName <name of an existing  template>
                [-cloneTemplate]
                -gdbName <global database name>
                [-sid <database system identifier prefix>]
                [-sysPassword <SYS user password>]
                [-systemPassword <SYSTEM user password>]
                [-emConfiguration <CENTRAL|LOCAL|ALL|NOBACKUP|NOEMAIL|NONE>
                        -dbsnmpPassword <DBSNMP user password>
                        -sysmanPassword <SYSMAN user password>
                        [-hostUserName <Host user name for EM backup job>
                         -hostUserPassword <Host user password for EM backup job>
                         -backupSchedule <Daily backup schedule in the form of hh:mm>]
                        [-smtpServer <Outgoing mail (SMTP) server for email notifications>
                         -emailAddress <Email address for email notifications>]
                        [-centralAgent <Enterprise Manager central agent home>]]
                [-datafileDestination <destination directory for all database files> |  -datafileNames <a text file containing database objects such as controlfiles, tablespaces, redo log files and spfile to their corresponding raw device file names mappings in name=value format.>]
                [-recoveryAreaDestination <destination directory for all recovery files>]
                [-datafileJarLocation  <location of the data file jar, used only for clone database creation>]
                [-storageType < CFS | ASM | RAW>
                        [-asmSysPassword     <SYS password for ASM instance>]
                        [-diskString      <disk discovery path to be used by ASM>]
                        [-diskList        <comma seperated list of disks for the database area disk group>
                         -diskGroupName   <database area disk group name>
                         -redundancy      <HIGH|NORMAL|EXTERNAL>]
                        [-recoveryDiskList        <comma seperated list of disks for the recovery area disk group>
                         -recoveryGroupName       <recovery area disk group name>
                         -recoveryGroupRedundancy <HIGH|NORMAL|EXTERNAL>]]
                [-nodelist <node names separated by comma for the database>]
                [-characterSet <character set for the database>]
                [-nationalCharacterSet  <national character set for the database>]
                [-registerWithDirService <true | false>
                        -dirServiceUserName    <user name for directory service>
                        -dirServicePassword    <password for directory service >
                        -walletPassword    <password for database wallet >]
                [-listeners  <list of listeners to configure the database with>]
                [-variablesFile   <file name for the variable-value pair for variables in the template>]]
                [-variables  <comma seperated list of name=value pairs>]
                [-initParams <comma seperated list of name=value pairs>]
                [-memoryPercentage <percentage of physical memory for Oracle>]
                        [-databaseType <MULTIPURPOSE|DATA_WAREHOUSING|OLTP>]]

Configure a database by specifying the following parameters:
        -configureDatabase
                -sourceDB    <local instance_name of source database>
                [-sysDBAUserName     <user name  with SYSDBA privileges>
                 -sysDBAPassword     <password for sysDBAUserName user name>]
                [-registerWithDirService|-unregisterWithDirService|-regenerateDBPassword <true | false>
                        -dirServiceUserName    <user name for directory service>
                        -dirServicePassword    <password for directory service >
                        -walletPassword    <password for database wallet >]
                [-emConfiguration <CENTRAL|LOCAL|ALL|NOBACKUP|NOEMAIL|NONE>
                        -dbsnmpPassword <DBSNMP user password>
                        -symanPassword <SYSMAN user password>
                        [-hostUserName <Host user name for EM backup job>
                         -hostUserPassword <Host user password for EM backup job>
                         -backupSchedule <Daily backup schedule in the form of hh:mm>]
                        [-smtpServer <Outgoing mail (SMTP) server for email notifications>
                         -emailAddress <Email address for email notifications>]
                        [-centralAgent <Enterprise Manager central agent home>]]


Create a template from an existing database by specifying the following parameters:
        -createTemplateFromDB
                -sourceDB    <service in the form of <host>:<port>:<sid>>
                -templateName      <new template name>
                -sysDBAUserName     <user name  with SYSDBA privileges>
                -sysDBAPassword     <password for sysDBAUserName user name>
                [-maintainFileLocations <true | false>]


Create a clone template from an existing database by specifying the following parameters:
        -createCloneTemplate
                -sourceSID    <local instance_name of source database>
                -templateName      <new template name>
                [-sysDBAUserName     <user name  with SYSDBA privileges>
                 -sysDBAPassword     <password for sysDBAUserName user name>]
                [-maintainFileLocations <true | false>]
                [-datafileJarLocation       <directory to place the datafiles in a compressed format>]

Generate scripts to create database by specifying the following parameters:
        -generateScripts
                -templateName <name of an existing  template>
                -gdbName <global database name>
                [-scriptDest       <destination for all the scriptfiles>]

Delete a database by specifying the following parameters:
        -deleteDatabase
                -sourceDB    <source database global database name>
                -sid    <local instance_name of source database>
                [-sysDBAUserName     <user name  with SYSDBA privileges>
                 -sysDBAPassword     <password for sysDBAUserName user name>]

Configure ASM DiskGroups by specifying the following parameters:
        -configureASM
                [-asmSysPassword   <SYS password for ASM instance>]
                [-diskString    <disk discovery path to be used by ASM>]
                [-diskList      <comma seperated list of disks for the database area disk group>
                 -diskGroupName <database area disk group name>
                 -redundancy    <HIGH|NORMAL|EXTERNAL>]]
                [-recoveryDiskList        <comma seperated list of disks for the database area disk group>
                 -recoveryGroupName       <database area disk group name>
                 -recoveryGroupRedundancy <HIGH|NORMAL|EXTERNAL>]
                [-emConfiguration <CENTRAL|NONE>
                 -centralAgent <Enterprise Manager central agent home>]]

Add an instance to a cluster database by specifying the following parameters:
        -addInstance
                -gdbName <global database name>
                -nodelist <node name for the new instance to add>
                [-instanceName <instance name for the new instance to add>]
                [-sysDBAUserName     <user name  with SYSDBA privileges>]
                 -sysDBAPassword     <password for sysDBAUserName user name>
                [-updateDirService <true | false>
                        -dirServiceUserName    <user name for directory service>
                        -dirServicePassword    <password for directory service >]

Delete an instance from a cluster database by specifying the following parameters:
        -deleteInstance
                -gdbName <global database name>
                -instanceName <instance name for the instance to be removed>
                [-nodelist <node name for the instance to be removed>]
                [-sysDBAUserName     <user name  with SYSDBA privileges>]
                 -sysDBAPassword     <password for sysDBAUserName user name>
                [-updateDirService <true | false>
                        -dirServiceUserName    <user name for directory service>
                        -dirServicePassword    <password for directory service >]
Query for help by specifying the following options: -h | -help
Original source: https://www.cnblogs.com/binliubiao/p/13684799.html